Exploiting Action Impact Regularity and Exogenous State Variables for Offline Reinforcement Learning

Authors

Abstract

Offline reinforcement learning (learning a policy from a batch of data) is known to be hard for general MDPs. These results motivate the need to look at specific classes of MDPs where offline learning might be feasible. In this work, we explore a restricted class of MDPs for which we can obtain guarantees for offline learning. The key property, which we call Action Impact Regularity (AIR), is that actions primarily impact a part of the state (an endogenous component) and have limited impact on the remaining part of the state (an exogenous component). AIR is a strong assumption, but it nonetheless holds in a number of real-world domains, including financial markets. We discuss algorithms that exploit this property and provide a theoretical analysis for an algorithm based on Fitted-Q Iteration. Finally, we demonstrate that the algorithm outperforms existing approaches across different data collection policies, in simulated and real-world environments where the regularity holds.
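To make the setting concrete, a Fitted-Q Iteration training loop over transitions whose state is split into endogenous and exogenous components might be sketched as below. This is a minimal illustration under stated assumptions, not the paper's algorithm: the synthetic dataset, the dimensions, and the `RandomForestRegressor` function approximator are all assumptions standing in for real logged transitions and the authors' actual choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical dataset: each transition's state is split into an endogenous
# part (affected by actions) and an exogenous part (evolves independently of
# actions, as the AIR property assumes). All data here is synthetic.
rng = np.random.default_rng(0)
n, n_actions, gamma = 500, 3, 0.9
endo = rng.normal(size=(n, 1))          # endogenous state component
exo = rng.normal(size=(n, 2))           # exogenous state component
acts = rng.integers(0, n_actions, size=n)
rewards = rng.normal(size=n)
next_endo = rng.normal(size=(n, 1))     # observed next endogenous state
next_exo = rng.normal(size=(n, 2))      # observed next exogenous state

# Fitted-Q Iteration: repeatedly regress Q(s, a) onto the bootstrapped
# target r + gamma * max_a' Q(s', a'). Under AIR, the exogenous trajectory
# can be reused unchanged when reasoning about alternative actions; here we
# simply run vanilla FQI on the concatenated state to show the loop shape.
X = np.hstack([endo, exo, acts.reshape(-1, 1)])
next_state = np.hstack([next_endo, next_exo])
q = RandomForestRegressor(n_estimators=20, random_state=0)
targets = rewards.copy()
for _ in range(5):
    q.fit(X, targets)
    # Evaluate Q at the next state for every action, then take the max.
    q_next = np.stack([
        q.predict(np.hstack([next_state, np.full((n, 1), a)]))
        for a in range(n_actions)
    ])
    targets = rewards + gamma * q_next.max(axis=0)
```

The decomposition matters because, when actions do not affect the exogenous component, logged exogenous trajectories remain valid under counterfactual action sequences, which is what makes offline evaluation tractable in this class.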


Similar articles

Learning State and Action Hierarchies for Reinforcement Learning Using Autonomous Subgoal Discovery and Action-Dependent State Space Partitioning

This paper presents a new method for the autonomous construction of hierarchical action and state representations in reinforcement learning, aimed at accelerating learning and extending the scope of such systems. In this approach, the agent uses information acquired while learning one task to discover subgoals for similar tasks. The agent is able to transfer knowledge to subsequent tasks and to...


Generating Hierarchical Structure in Reinforcement Learning from State Variables

This paper presents the CQ algorithm which decomposes and solves a Markov Decision Process (MDP) by automatically generating a hierarchy of smaller MDPs using state variables. The CQ algorithm uses a heuristic which is applicable for problems that can be modelled by a set of state variables that conform to a special ordering, defined in this paper as a “nested Markov ordering”. The benefits of ...


Reinforcement Learning in Continuous State and Action Spaces

Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action spaces, which can make learning a good decision policy even more involved. In this chapter we discuss how to ...


Offline Evaluation of Online Reinforcement Learning Algorithms

In many real-world reinforcement learning problems, we have access to an existing dataset and would like to use it to evaluate various learning approaches. Typically, one would prefer not to deploy a fixed policy, but rather an algorithm that learns to improve its behavior as it gains more experience. Therefore, we seek to evaluate how a proposed algorithm learns in our environment, meaning we ...


Reinforcement Learning in Continuous State and Action Space

To solve complex navigation tasks, autonomous agents such as rats or mobile robots often employ spatial representations. These “maps” can be used for localisation and navigation. We propose a model for spatial learning and navigation based on reinforcement learning. The state space is represented by a population of hippocampal place cells whereas a large number of locomotor neurons in nucleus a...



Journal

Journal title: Journal of Artificial Intelligence Research

Year: 2023

ISSN: 1076-9757, 1943-5037

DOI: https://doi.org/10.1613/jair.1.14580